National Repository of Grey Literature (Národní úložiště šedé literatury). 5 records found.
End-to-End Speech Recognition for Low-Resource Languages
Sokolovskii, Vladislav ; Schwarz, Petr (reviewer) ; Karafiát, Martin (supervisor)
The field of automatic speech recognition has started to adopt end-to-end neural network solutions for building speech recognizers. However, the data-hungry nature of these systems allows recognizers to be built only for high-resource languages, such as English, Chinese or Spanish. In low-resource scenarios, solutions that alleviate the data scarcity problem have to be developed. One of the most effective techniques for this is fine-tuning a pre-trained model. The problem with existing fine-tuning approaches is that the token sets of the target and source languages usually differ. That is why previous multilingual transfer learning approaches required the output layer to be changed, mixed tokens from different languages in the output layer, used universal token sets, or had separate output layers per language. This is undesirable because the sharing across languages is then latent and not controllable in the output space when the language-specific graphemes are disjoint. Therefore, this work proposes to map the tokens to a common set before pre-training begins. The existing solution was transliteration of the source language to the target one; the novel approach is romanization, in which the token set of the target language is romanized to match the English alphabet. Subsequently, the diacritics in the romanized hypotheses can be restored using an additional restoration model. This has the advantage of increasing sharing in the output grapheme space.
Neural Machine Translation for Language Pairs with a Small Amount of Training Data
Filo, Denis ; Fajčík, Martin (reviewer) ; Jon, Josef (supervisor)
This thesis deals with neural machine translation for so-called low-resource languages. The goal was to evaluate current techniques experimentally and to propose improvements to them. The translation systems in this work used the transformer neural network architecture and were trained with the Marian framework. The language pairs selected for the experiments were Slovak with Croatian and Slovak with Serbian. The experiments investigated the techniques of transfer learning and semi-supervised learning.
Building a Dependency Treebank for Yorùbá Using Parallel Data
Oluokun, Adedayo ; Zeman, Daniel (supervisor) ; Rosa, Rudolf (reviewer)
The goal of this thesis is to create a dependency treebank for Yorùbá, a language with very few pre-existing machine-readable resources. The treebank follows the Universal Dependencies (UD) annotation standard; certain language-specific guidelines for Yorùbá were specified. Known techniques for porting resources from resource-rich languages were tested, in particular the projection of annotation across parallel bilingual data. Manual annotation is not the main focus of this thesis; nevertheless, a small portion of the data was verified manually in order to evaluate the annotation quality. In addition, a model was trained on the manual annotation using UDPipe.